Inferring Haplotypes from genotypes on a Pedigree with mutations, genotyping Errors and Missing Alleles
نویسندگان
چکیده
Inferring the haplotypes of the members of a pedigree from their genotypes has been extensively studied. However, most studies do not consider genotyping errors and de novo mutations. In this paper, we study how to infer haplotypes from genotype data that may contain genotyping errors, de novo mutations, and missing alleles. We assume that there are no recombinants in the genotype data, which is usually true for tightly linked markers. We introduce a combinatorial optimization problem, called haplotype configuration with mutations and errors (HCME), which calls for haplotype configurations consistent with the given genotypes that incur no recombinants and require the minimum number of mutations and errors. HCME is NP-hard. To solve the problem, we propose a heuristic algorithm, the core of which is an integer linear program (ILP) using the system of linear equations over Galois field GF(2). Our algorithm can detect and locate genotyping errors that cannot be detected by simply checking the Mendelian law of inheritance. The algorithm also offers error correction in genotypes/haplotypes rather than just detecting inconsistencies and deleting the involved loci. Our experimental results show that the algorithm can infer haplotypes with a very high accuracy and recover 65%-94% of genotyping errors depending on the pedigree topology.
منابع مشابه
Efficient Inference of Haplotypes from Genotypes on a Pedigree with Mutations and Missing Alleles (Extented Abstract)
Driven by the international HapMap project, the haplotype inference problem has become an important topic in the computational biology community. In this paper, we study how to efficiently infer haplotypes from genotypes of related individuals as given by a pedigree. Our assumption is that the input pedigree data may contain de novo mutations and missing alleles but is free of genotyping errors...
متن کاملGenotyping errors, pedigree errors, and missing data.
Our group studied the effects of genotyping errors, pedigree errors, and missing data on a wide range of techniques, with a focus on the role of single-nucleotide polymorphisms (SNPs). Half of our group used simulated data, and half of our group used data from the Collaborative Study on the Genetics of Alcoholism (COGA). The simulated data had no missing genotypes and no genotyping errors, so o...
متن کاملEfficient inference of haplotypes from genotypes on a large animal pedigree.
We present a simple algorithm for reconstruction of haplotypes from a sample of multilocus genotypes. The algorithm is aimed specifically for analysis of very large pedigrees for small chromosomal segments, where recombination frequency within the chromosomal segment can be assumed to be zero. The algorithm was tested both on simulated pedigrees of 155 individuals in a family structure of three...
متن کاملThe use of family relationships and linkage disequilibrium to impute phase and missing genotypes in up to whole-genome sequence density genotypic data.
A novel method, called linkage disequilibrium multilocus iterative peeling (LDMIP), for the imputation of phase and missing genotypes is developed. LDMIP performs an iterative peeling step for every locus, which accounts for the family data, and uses a forward-backward algorithm to accumulate information across loci. Marker similarity between haplotype pairs is used to impute possible missing g...
متن کاملHAPLORE: a program for haplotype reconstruction in general pedigrees without recombination
MOTIVATION Haplotype reconstruction is an essential step in genetic linkage and association studies. Although many methods have been developed to estimate haplotype frequencies and reconstruct haplotypes for a sample of unrelated individuals, haplotype reconstruction in large pedigrees with a large number of genetic markers remains a challenging problem. METHODS We have developed an efficient...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Journal of bioinformatics and computational biology
دوره 9 2 شماره
صفحات -
تاریخ انتشار 2011